Chinese Numbers, MIX, Scrambling, and Range Concatenation Grammars
Author
Abstract
The notion of mild context-sensitivity was formulated in an attempt to express the formal power which is both necessary and sufficient to define the syntax of natural languages. However, some linguistic phenomena such as Chinese numbers and German word scrambling lie beyond the realm of mildly context-sensitive formalisms. On the other hand, the class of range concatenation grammars provides added power w.r.t. mildly context-sensitive grammars while keeping a polynomial parse time behavior. In this report, we show that this increased power can be used to define the above-mentioned linguistic phenomena with a polynomial parse time of a very low degree.

1 Motivation

The notion of mild context-sensitivity originates in an attempt by [Joshi 85] to express the formal power needed to define the syntax of natural languages (NLs). We know that context-free grammars (CFGs) are not adequate to define NLs since some phenomena are beyond their power (see [Shieber 85]). Popular incarnations of mildly context-sensitive (MCS) formalisms are tree adjoining grammars (TAGs) [Vijay-Shanker 87] and linear context-free rewriting (LCFR) systems [Vijay-Shanker, Weir, and Joshi 87]. However, there are some linguistic phenomena which are known to lie beyond MCS formalisms. Chinese numbers have been studied in [Radzinski 91], where it is shown that the set of these numbers is not an LCFR language and that it also appears not to be MCS, since it violates the constant growth property. Scrambling is a word-order phenomenon which also lies beyond LCFR systems (see [Becker, Rambow, and Niv 92]). On the other hand, range concatenation grammar (RCG), presented in [Boullier 98a], is a syntactic formalism which is a variant of simple literal movement grammar (LMG), described in [Groenink 97], and which is also related to the framework of LFP developed by [Rounds 88].
In fact it may be considered to lie halfway between their respective string and integer versions: RCGs retain from the string version of LMGs or LFPs the notion of concatenation, applying it to ranges (couples of integers which denote occurrences of substrings in a source text) rather than strings, and from their integer version the ability to handle only (part of) the source text (this latter feature being the key to tractability). RCGs can also be seen as definite clause grammars acting on a flat domain: their variables are bound to ranges. This formalism, which extends CFGs, aims at being a convincing challenger as a syntactic base for various tasks, especially in natural language processing. We have shown that the positive version of RCGs, like simple LMGs or integer indexing LFPs, exactly covers the class PTIME of languages recognizable in deterministic polynomial time. Since the composition operations of RCGs are not restricted to be linear and non-erasing, their languages (RCLs) are not semi-linear. Therefore, RCGs are not MCS and are more powerful than LCFR systems, while staying computationally tractable: their sentences can be parsed in polynomial time. However, this formalism shares with LCFR systems the fact that its derivations are CF (i.e. the choice of the operation performed at each step only depends on the object to be derived from). As in the CF case, its derived trees can be packed into polynomial sized parse forests. For a CFG, the components of a parse forest are nodes labeled by couples $(A, \rho)$ where $A$ is a nonterminal symbol and $\rho$ is a range, while for an RCG, the labels have the form $(A, \vec{\rho})$ where $\vec{\rho}$ is a vector (list) of ranges. Besides its power and efficiency, this formalism possesses many other attractive properties...
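To make the notions of range and of non-linear composition concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the report: the example language {a^(2^n) | n >= 0}, the two clauses S(a) -> epsilon and S(X Y) -> S(X) Eq(X, Y), and the names `recognize`, `s`, and `eq` are all assumptions chosen for the demonstration. The variable X is used twice on the right-hand side (non-linearity), and every predicate argument is a range, i.e. a couple of integers (i, j) denoting the substring text[i:j] of the source text.

```python
from functools import lru_cache


def recognize(text: str) -> bool:
    """Recognize the (hypothetical) example language {a^(2^n) | n >= 0}.

    Predicates take ranges (i, j) over `text` as arguments, never copies
    of substrings, so the number of distinct calls stays polynomial.
    """

    def eq(x, y):
        # Eq(X, Y): the two ranges denote occurrences of the same substring.
        (i1, j1), (i2, j2) = x, y
        return text[i1:j1] == text[i2:j2]

    @lru_cache(maxsize=None)  # memoize on the range, as a parse forest would
    def s(r):
        i, j = r
        # Clause S(a) -> epsilon: the range covers exactly one 'a'.
        if j - i == 1:
            return text[i] == "a"
        # Clause S(X Y) -> S(X) Eq(X, Y): try every way of splitting the
        # range into two adjacent ranges X = (i, k) and Y = (k, j).
        for k in range(i + 1, j):
            x, y = (i, k), (k, j)
            if eq(x, y) and s(x):  # X is used twice: non-linear clause
                return True
        return False

    return s((0, len(text)))


if __name__ == "__main__":
    for word in ["a", "aa", "aaa", "aaaa", "abab"]:
        print(repr(word), recognize(word))
```

The lengths of the accepted strings (1, 2, 4, 8, and so on) grow without a constant bound between consecutive values, which is the kind of behavior that semi-linear languages cannot exhibit.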
Similar resources
Tree-Local Multicomponent Tree-Adjoining Grammars with Shared Nodes
This article addresses the problem that the expressive power of tree-adjoining grammars (TAGs) is too limited to deal with certain syntactic phenomena, in particular, with scrambling in free-word-order languages. The TAG variants proposed so far in order to account for scrambling are not entirely satisfying. Therefore, the article introduces an alternative extension of TAG that is based on the n...
Developing a TT-MCTAG for German with an RCG-based Parser
Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (amongst others) redundancy and consistency issues. Furthermore, some languages can prove hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework for describing tree-based grammars, and (ii) an...
TuLiPA - Parsing Extensions of TAG with Range Concatenation Grammars
In this paper we present a parsing framework for extensions of Tree Adjoining Grammars (TAG) called TuLiPA (Tübingen Linguistic Parsing Architecture). In particular, besides TAG, the parser can process Tree-Tuple MCTAG with shared nodes (TT-MCTAG), a TAG-extension that has been proposed to deal with scrambling in free word order languages such as German. The central strategy of the parser is su...
From Contextual Grammars to Range Concatenation Grammars
Though natural language processing is one of the major aims that led to the definition of contextual grammars, very little has been done on that subject. One reason is certainly the lack of efficient parsers for contextual languages. In this paper we show how some subclasses of contextual grammars can be translated into equivalent range concatenation grammars and can thus be parsed ...
An Earley Parsing Algorithm for Range Concatenation Grammars
We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation help...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999